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Abstract 

Weighted low-rank approximation (WLRA) , a dimensionality reduction technique for data anal- 
ysis, has been successfully used in several applications, such as in collaborative filtering to design 
recommender systems or in computer vision to recover structure from motion. In this paper, we 
prove that computing an optimal weighted low-rank approximation is NP-hard, already when a 
rank-one approximation is sought. In fact, we show that it is hard to compute approximate solu- 
tions to the WLRA problem with some prescribed accuracy. Our proofs are based on reductions 
from the maximum-edge biclique problem, and apply to strictly positive weights as well as to binary 
weights (the latter corresponding to low-rank matrix approximation with missing data). 

Keywords: low-rank matrix approximation, weighted low-rank approximation, missing data, ma- 
trix completion with noise, PCA with missing data, computational complexity, maximum-edge 
biclique problem. 



1 Introduction 

Approximating a matrix with one of lower rank is a key problem in data analysis and is widely used 
for linear dimensionality reduction. Numerous variants exist emphasizing different constraints and 
objective functions, e.g., principal component analysis (PCA) |16j . independent component analysis [6], 
nonnegative matrix factorization [18] . and other refinements are often imposed on these models, e.g., 
sparsity to improve interpretability or increase compression [7J. 

In some cases, it may be necessary to attach a weight to each entry of the data matrix, expressing 
its relative importance [8]. This is for example the case in the following situations: 

o The matrix to be approximated is obtained via a sampling procedure and the number of samples 
and/or the expected variance vary among the entries. For example, it has been shown that using 
a weighted norm gives better results in 2-D digital filter design [19J and microarray data analysis 

EDI. 



o Some data is missing/unknown, which can be taken into account by assigning zero weights to 
the missing/unknown entries of the data matrix. This is for example the case in collaborative 
filtering, notably used to design recommender systems [23] (in particular, the Netflix prize com- 
petition has demonstrated the effectiveness of low-rank matrix factorization techniques [17]). or 
in computer vision to recover structure from motion [261 [15] , see also [3] . This problem is often 
referred to as PCA with missing data |26[ I13]. and can be viewed as a low-rank matrix completion 
problem with noise, i.e., approximate a given noisy data matrix featuring missing entries with a 
low-rank matrix^. 



1 Universite catholique de Louvain, CORE, B-1348 Louvain-la-Neuve, Belgium. E-mail: nicolas.gillis@uclouvain.be 
and francois.glineur@uclouvain.be. Nicolas Gillis is a research fellow of the Fonds de la Recherche Scientifique (F.R.S.- 
FNRS). This text presents research results of the Belgian Program on Interuniversity Poles of Attraction initiated by 
the Belgian State, Prime Minister's Office, Science Policy Programming. The scientific responsibility is assumed by the 
authors. 

x In our settings, the rank of the approximation is fixed a priori. 
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o A greater emphasis must be placed on the accuracy of the approximation on a localized part of 
the data, a situation encountered for example in image processing |14j Chapter 6]. 

Finding a low-rank matrix which is closest to the input matrix according to these weights is an 
optimization problem called weighted low-rank approximation (WLRA). Formally, it can be formulated 
as follows: first, given an m-by-n nonnegative weight matrix W G M™ xn , we define the weighted 
Frobenius norm of an m-by-n matrix A as ||.A||w = (Yli jWijAfj)? . Then, given an m-by-n real 
matrix M G M mxn and a positive integer r < min(m, n), we seek an m-by-n matrix X with rank at 
most r that approximates M as closely as possible, where the quality of the approximation is measured 
by the weighted Frobenius norm of the error: 



Since any m-by-n matrix with rank at most r can be expressed as the product of two matrices of 
dimensions m-by-r and r-by-n, we will use the following more convenient formulation featuring two 
unknown matrices U G ]R mxr an d V G R nxr but no explicit rank constraint: 



Even though (|WLRAj) is suspected to be NP-hard [13 [27], this has never, to the best of our knowledge, 
been studied formally. In this paper, we analyze the computational complexity in the rank-one case 
(i.e., for r = 1) and prove the following two results. 

Theorem 1. When M G {0, l} mxn ; and W G ]0, l] mxn ; it is NP-hard to find an approximate solution 
of rank- one (jWLRAp with objective function accuracy less than 2 _11 (mn)~ 6 . 

Theorem 2. When M G [0, l] mxn ; and W G {0, l} mxn ; %% is NP-hard to find an approximate solution 
of rank- one (jWLRAp with objective function accuracy less than 2 _12 (mn)~ 7 . 

In other words, it is NP-hard to find an approximate solution to rank-one (jWLRAp with positive 
weights, and to the rank-one matrix approximation problem with missing data. Note that these results 
can be easily generalized to any fixed rank r, see Remark [31 

The paper is organized as follows. We first review existing results about the complexity of (jWLRAp 
in Section [2j In Section 13.11 we introduce the maximum-edge biclique problem (MBP), which is 
NP-hard. In Sections 13.21 and 13.31 we prove Theorems [1] and [2] respectively, using polynomial-time 
reductions from MBP. We conclude with a discussion and some open questions. 

1.1 Notation 

The set of real m-by-n matrices is denoted M mxn , or M™ xn when all the entries are required to be 
nonnegative. For A G W mxn , we note A-j the j' th column of A, A{ : the i th row of A, and A{j or A(i,j) 
the entry at position (i,j); for b G IR mxl = M m , we note bi the i th entry of b. The transpose of A 
is A T . The Frobenius norm of a matrix A is defined as ||^4||^ = X^ij(^i) 2 ' ana - 1 1 -lb is the usual 
Euclidean norm with = 'Ylubf. For W G M^ ,,xn , the weighted Frobenius 'norm' of a matrix A 
is defined^] by ||^4||^/ = Wij(Aij) 2 . The m-by-n matrix of all ones is denoted l mX n> the m-by-n 
matrix of all zeros mxn , and /„ is the identity matrix of dimension n. The smallest integer larger or 
equal to x is denoted \x] . 

2 ||.||w is a matrix norm if and only if W > 0, else it is a semi-norm. 



p* = inf \\M — X\\w such that X has rank at most r. 

xeR mxn 



inf 

UeB. mxr ,V&M. nxr 




(WLRA) 
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2 Previous Results 



Weighted low-rank approximation is suspected to be much more difficult than the corresponding 
unweighted problem (i.e., when W is the matrix of all ones), which is efficiently solved using the 
singular value decomposition (SVD) [12J. In fact, it has been previously observed that the weighted 
problem might have several local minima which are not global [27J, while this cannot occur in the 
unweighted case (i.e., when W is the matrix of all ones), see, e.g., [14|, p. 29, Th.1.14]. 



Example 1. Let 



M 




and W 




In the case of a rank-one factorization (r = 1) and a nonnegative matrix M, one can impose without 
loss of generality that the solutions of (jWLRAp are nonnegative. In fact, one can easily check that 
any rank-one solution uv T of (|WLRAp can only be improved by taking its component-wise absolute 
value \uv T \ = |u||t>| T . Moreover, we can impose without loss of generality that \\u\\2 = 1, so that only 
two degrees of freedom remain. Indeed, for a given 



u(x,y) 



x 

y 



with 



x>0,y>0 
x 2 + y 2 < 1 



the corresponding optimal v*(x,y) = argmin^ \ \M — u(x,y)v q 



w 



can 



be computed easil^. Figure [7] 



displays the graph of the objective function \ \M — u(x, y)v*(x, y) T \ \w with respect to parameters x and 
y; we observe four local minima, close to (^,0), (0, -^), (0,0) and (-^,-^). We will see later 



in 








Figure 1: Objective function of (jWLRAp with respect to the parameters (x,y). 



Section [5] how this example has been generated. 



3 This problem can be decoupled into n independent quadratic programs in one variable, and admits the following 
closed-form solution: v*(x,y) — [(M oW) T u]/ .[W T (uou)], where o (resp. /.) is the component-wise multiplication (resp. 
division) . 
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However, if the rank of the weight matrix W 6 R™ xn is equal to one, i.e., W = st T for some 
s £ and t £ M™ , (jWLRAp can be reduced to an unweighted low-rank approximation. In fact, 



\M-UV T \\ 2 W = Y. W v( M - UvT )% = ^2s i t,(M -UV T ) 




Therefore, if we define a matrix M' such that M-j = sj srfj Mij Vi,j, an optimal weighted low-rank 
approximation (U, V) of M can be recovered from a solution (U',V) to the unweighted problem for 
matrix M' using U i: = U-./y/sl Vi and V j: = V-./^/t] Vj. 

When the weight matrix W is binary, WLRA amounts to approximating a matrix with missing 
data. This problem is closely related to low-rank matrix completion, see [2] and the references therein, 
which can be defined as 

min rank(X) such that = My for £ SI, (MC) 

where C {1, 2, . . . , to} x {1, 2, . . . , n} is the set of entries for which the values of M are known. 
(|MCp has been shown to be NP-hard [5], and it is clear that an optimal solution X* of (|MCj) can be 
obtained by solving a sequence of (|WLRAp problems with the same matrix M, with 



1 if(»,j)€fi 
otherwise 



and for different values of the target rank ranging from r = 1 to r = min(m, n). The smallest value of 
r for which the objective function \ \M — UV T \\y V of (jWLRAp vanishes provides an optimal solution 
for (|MCp . This observation implies that it is NP-hard to solve (jWLRAp for each possible value of 
r from 1 to min(m,n), since it would solve (jMCp . However, this does not imply that (jWLRAp is 
NP-hard when r is fixed, and in particular when r equals one. In fact, checking whether (jMCp admits 
a rank-one solution can be done easil}Q. 

Rank-one (jWLRAp can be equivalently reformulated as 

inf ||M - A\ \w such that rank(A) < 1, 

and, when W is binary, is the problem of finding, if possible, the best rank-one approximation of a 
matrix with missing entries. To the best of our knowledge, the complexity of this problem has never 
been studied formally; it will be shown to be NP-hard in the next section. 



Another closely related result is the NP-hardness of the structure from motion problem (SFM), in 
the presence of noise and missing data [21] . Several points of a rigid object are tracked with cameras 
(we are given the projections of the 3-D points on the 2-D camera planes jl, and the aim is to recover 
the structure of the object and the positions of the 3-D points. SFM can be written as a rank-four 
(jWLRAp problem with a binary weight matrbid [15]. However, this result does not imply anything on 
the complexity of rank-one (jWLRAp . 

An important feature of (jWLRAp is exposed by the following example. 

4 The solution X — uv T can be constructed observing that the vector u must be a multiple of each column of M. 
5 Missing data arise because the points may not always be visible by the cameras, e.g., in the case of a rotation. 
6 With the additional constraint that the last row of V must be all ones, i.e., V r: = lixn- 
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Example 2. Let 

"-(11 

where ? indicates that an entry is missing, i.e., that the weight associated with this entry is (1 
otherwise). Observe that \f(u,v) G M. m x M ra , 

rank(M) = 2 and rank{uv T ) = 1 =>• ||M — uv T \\w > 0. 

However, we have 

inf MM - uv T \\w = 0. 



AW 

(«,n)6t ra xR" 



In fact, one can check that with 



1 \ , , , ( 1 \ , ■ ' \Ti 



u(e) = I ^ J and u(e) = I J , we have lim | |M — u(e)v(e) \\w = 0. 

This indicates that, when has zero entries, the set of optimal solutions of (jWLRAp might be 
empty. In other words, the (bounded) infimum of the objective function might be unattained. On the 
other hand, the infimum is always attained for W > since \\.\\w is then a norm. 

For this reason, these two cases will be analyzed separately: in Section I3.2| we study the compu- 
tational complexity of the problem when W > 0, and, in Section \3.3\ the case of a binary W (i.e., the 
problem with missing data). 



3 Complexity of rank-one (IWLRAp 

In this section, we use polynomial-time reductions from the maximum-edge biclique problem to prove 
Theorems [T] and [2j 

3.1 Maximum-Edge Biclique Problem 

A bipartite graph is a graph whose vertices can be partitioned into two disjoint sets such that there 
is no edge between two vertices in the same set. The maximum-edge biclique problem (MBP) in a 
bipartite graph is the problem of finding a complete bipartite subgraph (a biclique) with the maximum 
number of edges. 

Let M 6 {0, l} mxn be the biadjacency matrix of a bipartite graph Gb = (V% U V%, E) with V\ = 
{si, . . . s m }, V 2 = {ti,... t n } and E C (Vi x V 2 ) , i.e., 

M i:j = l <=> ( Si ,tj)eE. 

The cardinality of E will be denoted \E\ = ||M|||i < mn. 

For example, Figure [2] displays the graph generated by the matrix M of Example [TJ 

51 *\ / tl 

52 tl 

53 * U 

Figure 2: Graph corresponding to the matrix M of Example [TJ 
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With this notation, the maximum-edge biclique problem in a bipartite graph can be formulated as 
follows [TT\ 

min | \M — uv T \ \"p 



u,v 



u;r, < AI;,. V/../ (MBP) 
«€{0,ir,«€{0,l} n , 

where Ui = 1 (resp. Vj = 1) means that node Sj (resp. tj) belongs to the solution, m = (resp. Vj = 0) 
otherwise. The first constraint guarantees feasible solutions of (jMBPp to be bicliques of Gb- In fact, 
it is equivalent to the implication 

Mij = => Ui = or vj = 0, 

i.e., if there is no edge between vertices Sj and tj, they cannot simultaneously belong to a solution. 
The objective function minimizes the number of edges outside the biclique, which is equivalent to 
maximizing the number of edges inside the biclique. Notice that the minimum of (jMBPp is \E\ — \E*\, 
where \E*\ denotes the number of edges in an optimal biclique. 
The decision version of the MBP problem: 

Given K, does Gb contain a biclique with at least K edges? 

has been shown to be NP-complete [23] in the usual Turing machine model [9], which is our framework 
in this paper. Therefore, computing \E\ — \E*\, the optimal value of (jMBPj) . is NP-hard. 

3.2 Low-Rank Matrix Approximation with Positive Weights 

In order to prove NP-hardness of rank-one (jWLRAp with positive weights (W > 0), let us consider 
the following instance: 

p* = min \\M -uv T \\l v , (W-ld) 

with M 6 {0, l} mxn the biadjacency matrix of a bipartite graph Gb = (V, E) and the weight matrix 
defined as 

where d > 1 is a parameter. 

Intuitively, increasing the value of d makes the zero entries of M more important in the objective 
function, which leads them to be approximated by small values. This observation will be used to show 
that, for d sufficiently large, the optimal value p* of (|W-ldp will be close to \E\ — \E*\, the optimal 
value of (IMBPl) (Lemma EJ. 

A maximal biclique in Gb is a biclique not contained in a larger biclique, and can be seen as a 
'locally' optimal solutions of (jMBPj) . We will show that, as the value of parameter d increases, the 
local minima of (jW-ldp get closer to binary vectors describing maximal bicliques in Gb- 

Example [T] illustrates the situation: the graph Gb corresponding to matrix M (cf. Figure[2|) contains 
four maximal bicliques {si,ss,ti,tz}, {s2, S3, £2, £3}, {S3, ti, t2, £3} and {si, S2, S3, t3}, and the weight 
matrix W that was used is similar to the case d = 100 in problem (jW-ldp . We now observe that (|W-ldp 

has four local optimal solutions as well (cf. Figured]) close to (^, 0), (0, (0, 0) and (^, ^). There 
is a one to one correspondence between these solutions and the four maximal bicliques listed above (in 
this order). For example, for (x, y) = (-^,0) we have u(x, y) = (-^, 0, , v*(x, y) is approximately 
equal to (v2, 0, V2) T , and this solution corresponds to the maximal biclique {si, S3, ti, £3}. 

Notice that a similar idea was used in [TO] to prove NP-hardness of the rank-one nonnegative 



T 

'vEI&t I \ 1VA ~ «' 

sufficiently large negative ones. 



factorization problem min ug Rm ^gjgn ||M — uv \\f, where the zero entries of M were replaced by 
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Remark 1 (Link with classical quadratic penalty method). It is worth noting that (|W-ldp can be 

viewed as the application of the classical quadratic penalty approach to the biclique problem, see, e.g., 
J2U §17.1]. In fact, defining F = {(i,j)\Mij = 1} and its complement F = {(i,j)\Mij = 0}, the 
biclique problem can be formulated as 

min (1 — UiVj) 2 such that u^Vj = for all (i,j) € F. (3-1) 

Indeed, in this formulation, it is clear that any optimal solution can be chosen such that vectors u and v 
are binary, from which the equivalence with problem (jMBPj) easily follows. Penalizing (quadratically) 
the equality constraints in the objective, we obtain 

Pd(u,v) = (l-UiVjf + d (UiVj) 2 , 

(i,i)e-P {ij)eF 

where d > is the penalty parameter. We now observe that our choice of W at the beginning of this 
section gives P^(u,v) = \\M — uv T \\ 2 v , i.e., (|W-ldp is exactly equivalent to minimizing P^{u,v). This 
implies that, as d grows, minimizers of problem (jW-ldj) will tend to solutions of the biclique problem 
(jMBPp . Our goal is now to prove a more precise statement about the link between these two problems: 
we provide (in Lemma\3±) an explicit value for d that guarantees a small difference between the optimal 
values of these two problems. 

First, we establish that for any (u,v) such that \ \M — ut> T ||^/ < \E\, the absolute value of the row 
or the column of uv T corresponding to a zero entry of M must be smaller than a constant inversely 
proportional to \fd. 

Lemma 1. Let (i,j) be such that Mij = 0, then \/(u,v) such that \ \M — uv T \\ 2 v < \E\, 



■ ( ii i iW V 4 ^! 2 

mm max \UiVh , max \u v Vn\\ < \ — : — . 

V KKn' feh l<p<m l P J 7 ~ V d 

Proof. Without loss of generality u and v can be scaled such that ||w||2 = 1Mb without changing 

"lb,,. „„j „, u„ „,/ _ /[Mia.. il./ii _ 1U./1 



the product uv , i.e., we replace u by u' = y ||^||| u and v by v' = y jffi^v so that ||it'||2 = \ \v'\\2 
\/\\u\\2\\v\\2 and u'v' T = uv T . First, observe that since \\.\\w is a norm, 

\\uv T \\ w - y/\E\ = \\uv T \\ W - \\M\\ W < \\M -uv T \\ w < y/\E\. 
Since all entries of W are larger than 1 (d > 1), we have 



^IklMh = \\uv \\f < \\uv \\w < y / 4|£'|, 



and then ||it||2 = H^lb < v^T^I- 

Moreover (i(0 — UiVj) 2 < \\M — uv T \\ 2 v < \E\, so that \uiVj\ < \l ^ which implies that either 

< y or < y Combining the above inequalities with the fact that (maxi<fc< n \vk\) and 



u 



(maxi< p < m are bounded above by ||n||2 = 1 1 f 1 1 2 < V^I^T completes the proof. □ 

We now prove the following general lemma which, combined with Lemma Q] above, will allow us 
to derive a lower bound on the objective function of (|W-ldp (it will also be used for the proof of the 
problem with missing data in Section [3.3p . 
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Lemma 2. Let M E {0, l} mxn &e the biadjacency matrix of a bipartite graph Gb = (V, E), W £ M 
a weight matrix such that W%j = 1 /or eac/i pair satisfying M{j = 1, and (u, v) be such that 



mini max \uiVk\, max livud ) < c, (3.2) 
V l<fc<n l<p<m J J 

for each pair (i,j) satisfying Mij = 0, where < c < 1. Let also p = \E\ — \E*\ be the optimal objective 
function value of (jMBPp , Then, if p > ; we have 

\\M - uv T \\ w > p(l ~ 2c). 

Proof. Define the biclique corresponding to the following set Q c (u, v) C {1, 2, . . . , m} x {1, 2, . . . , n} 

O c («,i;) = { i | 3j s.t. > c} x {j | 3i s.t. > c}. 

This biclique is part of the original graph, i.e., every edge in 0, c (u, v) belongs to G^. Indeed, if Mij = 0, 
the pair (i,j) cannot belong to Q c (u,v) since, by Equation (|3.2p . the absolute value of either the i th 
row or the j th column of uv T is smaller than c. By construction, we also have that the entries 
corresponding to pairs not in the biclique Q c (u,v) are approximated by values smaller than c. 
The error corresponding to a unit entry of Mjj not in the biclique Q c (u,v) is then at least (1 — c) 2 
(because the corresponding weight Wij is equal to one). Since there are at least p = \E\ — \E*\ such 
entries (because there are \E\ unit entries in M and at most \E*\ pairs in biclique Q c (u,v)), we have 

||M - uv T \\w > (1 - cfp > p(l - 2c) =p - 2pc. 

□ 

We can now provide lower and upper bounds on the optimal value p* of (|W-ld|) , and show that it 
is not too different from the optimal value \E\ — \E*\ of (jMBPj) . 

Lemma 3. Let < e < 1. For any value of parameter d such that d > — the optimal value p* of 
(jW-ldj) satisfi es 

\E\ - \E*\ -e <p* < \E\ - \E*\. 

Proof. Let (u, v) be an optimal solution of (| W-ld|) (since W > 0, there always exists at least one 
optimal solution, cf. Section[2]), and let us note p = \E\ — \E* \ > 0. If p = 0, then p* = and the result 
is trivial (it is the case when the rank of M is one, i.e., contains only one biclique). Otherwise, 
since any optimal solution of (|MBPj) plugged in (|W-ldj) achieves an objective function equal to p, we 
must have 

p* = \\M - uv T \\l v <p=\E\- \E*\, 

which gives the upper bound. 

Since d is greater than 4\E\ 2 for any < e < 1, the constant a = \J appearing in Lemma [T] 
is smaller than one. This means that Lemma [2] is applicable, so that we have 

\\M - w T ||h/ > p - lap >p - 2a\E\ >p-e, 

which gives the lower bound (the last inequality follows from the fact that 2a|S| < e is equivalent to 
the condition d > 2 ffi ). □ 

This result implies that for e = 1, i.e., for d > (2|£'|) 6 , we have \E\ - \E*\ - 1 < p* < \E\ — \E*\, 
and therefore computing p* exactly would allow to recover \E*\ (since \E*\ = \E\ — \p*~\), which is 
NP-hard. Since the reduction from (|MBP|) to ()W-ldp is polynomial (it uses the same matrix M and a 
weight matrix W whose description has polynomial length), we conclude that solving (| W-ld|) exactly 
is NP-hard. The next result shows that even solving (j W-ldj) approximately is NP-hard. 
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Corollary 1. For any d > (2mnf, M € {0, l} mxn and W G {l,d} mxn , it is NP-hard to find an 
approximate solution of rank- one (jWLRAp with objective function accuracy less than 1 — — • 

Proof. Let d > (2ran) 6 , < e = 1 < 1, and (u,v) be an approximate solution of (jW-ldjl with 

objective function accuracy (1 — e), i.e., p* < p = \ \M — u^W^y < p* + 1 — e. Since d = ly2m ^ 1 > ( 2 I^D ; 
Lemma [3] applies and we have 

\E\ - \E*\ - e < p* < p < p* + 1 - e < \E\ - \E*\ + 1 - e. 

We finally observe that knowing p allows to recover which is NP-hard. In fact, adding e to the 
above inequalities gives \E\ — \E*\ < p + e < \E\ — \E* \ + 1, and therefore 



\E*\ = \E\ 



p + e 



+ 1. 

□ 



We are now in position to prove Theorem [H which deals with the hardness of rank-one (WLRA) 
with bounded weights. 

Theorem^ Let us use Corollary ffl with W G {1, d} mxn , and define W' = \W G {\, l} mxn . Clearly, 
replacing by W in ()W-ldp simply amounts to multiplying the objective function by -j, with 
\\M — uv T \\'y VI = h\\M — uv T \\y V . Taking d 1 / 4 = 2(2mn) 3 / 2 in Corollary [Q we obtain that for 
M G {0, l} mxn and W G]0, l] mxn , it is NP-hard to find an approximate solution of rank-one (jWLRAj) 

with objective function accuracy less than 4(1 — ^ 2 "!"A — ) = 4-j = 2~ n (mn)~ 6 . □ 



#74 J - 2d 

Remark 2. The above bounds on d have been crudely estimated, and can be improved. Our main goal 
here was to show existence of a polynomial-time reduction from (jMBPp to rank-one (jWLRAp . 

Remark 3. Using the same construction as in Theorem 3], this rank-one NP-hardness result can 
be generalized to any factorization rank, i.e., approximate (jWLRAp for any fixed rank r is NP-hard. 
The idea is the following: given a bipartite graph Gj, with biadjacency matrix M G {0, l} mxn , we 
construct a larger bipartite graph G' b which is made of r disconnected copies of Gb, whose biadjacency 
matrix is therefore given by 



M' 



f M mX n • • • mX n ' 
Omxn M mX n 

V o mxre ... M j 



€{0,1}' 



Clearly, no biclique in this graph can be larger than a maximum biclique in Gf,, and there are (at least) 
r disjoint bicliques with such maximum size in G' b . Letting (U,V) G W mxr x K rxrn be an optimal 
solution of the rank-r (jWLRAp problem with M' above and weights W = M' -\-d(\ rm xm ~ 

M') defined 

as before, it can be shown that, for d sufficiently large, each rank-one factor U±V^ must correspond 
to a maximum biclique of Gb. 

3.3 Low-Rank Matrix Approximation with Missing Data 

The above NP-hardness proof does not cover the case when W is binary, corresponding to missing data 
in the matrix to be approximated (or to low-rank matrix completion with noise). This corresponds to 
the following problem 

inf \\M -UV T \\ 2 W = Y'WiAM -UV T )l , W G {0, i} mx ". 

E/6K mxr ,Vel nxr ^— ' 



9 



In the same spirit as before, we consider the following rank-one version of the problem 

p = 

with input data matrices M and W denned as follows 



inf 



\M-uv T \\ 2 w , 



(MD-ld) 



M 



M b 


O s xZ 


Ozxt 


dl z 



and W 







B 2 


Iz 



where M b E {0,1} is the biadjacency matrix of the bipartite graph G b = (V,E), d > 1 is a 
parameter, Z = st — \E\ is the number of zero entries in M b , and m = s + Z and n = t + Z are the 
dimensions of M and W . 

Binary matrices B\ E {0, l} sxZ and B 2 E {0, iy Zxt are constructed as follows: assume the Z zero 
entries of Mf, can be enumerated as 

{M b (ii,h), M b (i 2 ,j 2 ), ■ ■ .,M b {i z ,jz)}, 

and let kij be the (unique) index k (1 < k < Z) such that (ik,jk) = (hj) (therefore fcjj is only defined 
for pairs such that M b (i,j) = 0, and establishes a bijection between these pairs and the set 

{1, 2, . . . , Z}). We now define matrices B\ and B 2 as follows: for every index 1 < kij < we have 

Bi(i,kij) = l,Bi(i',kij) = Vi' ^ i and B 2 (%,i) = l,B 2 (kij,f) = V/ / j . 

Equivalently, each column of -Bi (resp. row of i?2) corresponds to a different zero entry M b (i,j), and 
contains only zeros except for a one at position i within the column (resp. at position j within the 
row). Hence the matrix B\ (resp. B 2 ) contains only zero entries except Z entries equal to one, one in 
each column (resp. row). 

In the case of Example [H we get 



M 



1 1 




\ 


1 1 


03x2 




1 1 1 






02x3 


dl 2 


) 



and W 





1 


1-3x3 


1 







10 


h 


1 



i.e., the matrix to be approximated can be represented as 



/ 1 





1 





? > 





1 


1 


9 





1 


1 


1 


9 


? 


? 





9 


d 


? 


V o 


9 


9 


? 


d J 



(3.3) 



For any feasible solution (u, v) of (|MD-ldp , we also note 

JeR™ E R s and E R z , 



E M n , E 



and 



E 



We will show that this formulation ensures that, as d increases, the zero entries of matrix M b (the 
biadjacency matrix of G b which appears as the upper left block of matrix M) have to be approximated 
with smaller values. Hence, as for (|W-ldp . we will be able to prove that the optimal value p* of 



10 



(|MD-ldj) will have to get close to the minimal value \E\ — \E*\ of (jMBPj) . implying NP-hardness of 
its computation. 

Intuitively, when d is large, the lower right matrix dlz of M will have to be approximated by 
a matrix with large diagonal entries, since they are weighted by unit entries in matrix W . Hence 

.v^" . has to be large for all 1 < ha < Z. We then have at least either ui \ or v^ 1 . large for all 
kij (recall that each kij corresponds to a zero entry in M at position (i,j), cf. definition of B\ and 
B2 above). By construction, we also have two entries M(s + kij,j) = and M(i,t + kij) = with 
unit weights corresponding to the nonzero entries Bi(i, kij) and B2(kij,j), which then also have to be 
approximated by small values. If u k . (resp. v^ 1 .) is large, then Vj (resp. u\ ) will have to be small 

since u ^.vp ~ (resp. u f v^. « 0). Finally, either up or Vj has to be small, implying that Mb(i,j) 
is approximated by a small value, because (it' 6 ), v^) can bounded independently of the value of d. 

We now proceed as in Section 13.21 Let us first give an upper bound for the optimal value p* of 
(TMD-ldp . 

Lemma 4. For d > 1, the optimal value p* of (|MD-ld|) is bounded above by \E\ — \E*\, i.e., 

p* = inf ||M-ra T ||L < \E\ - \E*\. (3.4) 

u£R m ,v£M. n 

Proof. Let us build the following feasible solution (it, v) of (jMD-ldj) : (i/ 6 ),<y( 6 )) is a (binary) optimal 
solution of (jMBPp and (u^ d \v^) is defined ad3 

M / dK if v f = °> a M / ^ if «f = 0. * 

% = if |) = 1) and %={ d K if v fe = 1) M 

where K is a real parameter and kij is the index of the column of B\ and the row of B2 corresponding 
to the zero entry (i,j) of M b (i.e., (i,j) = (i k .. , j k ..)). 
We have that 



(uv T ) o W 



u (b) v (b) T D 



D 2 




where o is the component-wise (or Hadamard) product between two matrices, and matrices D\ and 
D2 satisfy 

Di(l,p) = XB i (l,p) = 0, 

Di(l,p) G {0,^} tiBi(!,p) = l, % 

In fact, let us analyze the four blocks of (uv T ) o W: 

1. Upper-left: the upper-left block of W and uv T are respectively the all-one matrix and 

2. Lower-right: since the lower-right block of W is the identity matrix, we only need to consider 
the diagonal entries of the lower-right block of uv T , which are given by u k v k , = d for kij = 
1, 2, . . . , Z, cf. Equation (|3.5p . 

3. Upper-right and lower-left: by definition of B\ and B2, the only entries of D\ and D 2 which may 
be different from zero are given by 

Di(»,fcy) = ul b) v ( k f. and D 2 (%,i) = %V> 



7 Notice that this construction is not symmetric, and the variant using instead of i/ 6 ' to define and is 
also possible. 
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for all such that Mf,(i,j) = 0. By construction, we have either = or v 



,(*>) 



1. If 



,(*>) 



0, then Vfc. = d 1 K by Equation (|3,5p and we have D\(i,kij) 



and D2(kij,j) = 0. If u 



(&) 



1, we have u 



(&) 



(since Mb(i,j) 



Equation (|3.5p whence D\(i,kij) = and D2(kij,j) = d 



l-K 



(6) (d) 
0) and uf^ 



€ {O.d 1 -*} 
= d l - K by 



Finally, D\ and Z?2 have at most Z non-zero entries (recall Z is the number of zero entries in Mb), 
which are all equal to d x ~ K \ therefore, 



p* < \\M - uv T \\ 2 v < \E\ -\E*\ + 2Zd 2{l ~ K \ VK. 
Since d > 1, taking the limit K — > +oo gives the result. 



(3.6) 
□ 



Example 3. Let us illustrate the construction of Lemma [^] on the matrix from Example [7J which 
contains two maximum bicliques with 4 edges, including the one corresponding to vfi^ = (0, 1, 1) T and 
= (0,1, 1) T . Taking u= (0, 1, 1, d 1 ~ K , d K ) T and v = (0, 1, 1, d K , d 1 ~ K ) T , we obtain 



1 





1 





? \ 




^ 














\ 





1 


1 


? 










1 


1 


d^ 






1 


1 


1 


? 


? 


WTO = 





1 


1 


d^ 






? 





? 


d 


? 







d l - K 


d}- K 


d 









? 


? 


? 






\o 


d K 


d K 


d 2K 


d 


) 



T\\2 



It" 



(|£| - |i?*|) + 2d 2 ( 1 - K ) = 3 + 2d 2 ( 1 "^), icfccfr is /ess toon the bound 3 + 4d 2 ^- K ^ 



with 1 1 M — uv 
guaranteed by Equation (|3.6|) . 

We now prove a property similar to Lemma [1] for any solution with objective value smaller that 

\E\. 

Lemma 5. Let d > \f\E\ and (i,j) be such that Mb(i,j) = 0, then the following holds for any pair 
(u,v) such that \\M — uv T \\ 2 v < \E\: 



mm I max max \u p Vj 

l<k<n l<p<m 



< 



V2\E\ 



(3.7) 



Proof. Without loss of generality we set ||u^||2 
Observing that 



(d-vw\r 2 

II2 by scaling u and v without changing uv T . 



I« (6) ||2||f (6) ||2 



\u^v^ T \\ F 



we have \\u^\ 



2 v 



(6)1 



< 2a/|E|, and ||u 



(6)1 



\M b \\ F < \\M b 

< \\M-uv 1 

j^\\ 2 < y/2\E\*. 



U (%® T \\ F 



U" 



< 



Assume M b (i,j) is zero for some pair and let k = k%j denote the index of the corresponding 

column of B\ and row of B2 (i.e., such that B\{i, k) = B2(k,j) = 1). By construction, vfj^v^ has to 
approximate d in the objective function. This implies (w^ — d) 2 < \E\ and then 

.WL(<9 



ul'v^ k '>d 



E > 0. 



Suppose is greater than |i>£ | (the case where l^j^l is greater than |tti | is similar), which implies 
1*4^1 > (d— I i?! 2)2. Moreover, since ^(/c, j) is a unit weight, we have that u^Vj has to approximate 
zero in the objective function, implying 

(uf Vj - 0) 2 < \E\ = 



I {d) I , 



El. 
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Hence 

, VW\ \E\* 
\ y i\ ^ 717)7 ^ ~ ^T> ( 3 - 



and since (maxi< p < m \u p \) is bounded by ||u^||2 < V^l-^l 4 , the proof is complete. □ 
Using Lemma [21 we can now derive a lower bound for the value of p* . 



7 



81 El ^ ^ 

Lemma 6. Let < e < 1. For any value of parameter d strictly greater than -4 — h|-E| 5 , the infimum 
p* of (IMD-ldp satisfies 

\E\ - \E*\ -e<p*. 

Proof. Let us note p = \E\ — \E*\. If p = 0, the result is trivial since p* = 0. Otherwise, suppose 

3 7 

P* < P — e and let /3 = 2_L§1L_. pj rs ^ b serV e that d > + |i?|2 is equivalent to 2|S|/3 < e. 

Then, by continuity of ([MD-ldp . for any 5 such that 5 < e, there exists a pair (u, w) such that 

\\M b -u^v^ T \\ 2 w < \\M -uv T \\ 2 w <p-5< \E\. 

3 1 

In particular, let us take 5 = 2\E\(3 < e. Observe that {3 < 1 as soon as d > 2\E\z + \E\z (which is 
guaranteed because < e < 1). By Lemma [5] and Lemma [2] (applied on matrix M b and the solution 
(m^,!)^)), we then have 

p-2/3p< \\M b -u {b) v {h)T \\ 2 w < \\M-uv T \\ly <p-5. 

Dividing the above inequalities by p > 0, we obtain 

l-2/3<l--<l- T = r =><5< 2\E\P, 
p \E\ 

a contradiction. □ 

Corollary 2. For any d > 8(mn) 7 / 2 + jmn, AT G {0, 1, d} mxn , and TU G {0, l} mx ™, # is NP-hard to 
find an approximate solution of rank-one (jWLRAj) with objective function accuracy 1 — jj^^=-^ . 

Proof. Let d > 8(mn) 7 / 2 + ^mn, < e = ^^=^5- < 1, and (u,v) be an approximate solution 

of (jW-ldj) with absolute error (1 — e), i.e., p* < p = \\M — iiv T \\yy < p* + 1 — e. Lemma [6] applies 

because d = 8 ( m ^ + ^Jmn > 8 ^ / +^fsi > 8 ^ 71 + l-E^ 1 / 2 - Using Lemmas [Hand El the rest of the 
proof is identical as the one of Theorem [TJ Since the reduction from ([MBPp to (|MD-ldp is polynomial 
(description of matrices W and M has polynomial length, since the increase in matrix dimensions 
from M b to M is polynomial), we conclude that finding such an approximate solution for ([MD-ldj) is 
NP-hard. □ 

We can now easily derive Theorem [21 which deals with the hardness of rank-one (WLRA) with a 
bounded matrix M. 

Theorem Replacing M by M' = ±M in (|MD-ldj) gives an equivalent problem with objective func- 
tion multiplied by 4-, since 4y||M — uf T || 2 y = \ \M' — ^-\\xv- Taking d = 2 5 (mn) 7 / 2 + y/nvn in Corol- 
lary [2l we find that it is NP-hard to compute an approximate solution of rank-one (jWLRAp for M G 
[0, l] mxn and W G {0, l} mxn , and with objective function accuracy less than ^-^1 — 
2^ > 2~ 12 (mn)- 7 . □ 
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4 Concluding Remarks 



In this paper, we have studied the complexity of the weighted low-rank approximation problem 
(WLRA), and proved that computing an approximate solution with some prescribed accuracy is NP- 
hard, already in the rank-one case, both for positive and binary weights (the latter also corresponding 
to low-rank matrix completion with noise, or PC A with missing data). 

The following more general problem is sometimes also referred to as WLRA: 

inf \\M-UV\\f P) , (4.1) 

UeM mxr ,VeM rXn y ' 

where ||j4||?px = vec(A) T Pvec(A) , with vec(A) a vectorization of matrix A and P an mn-hy-mn posi- 
tive semidefinite matrix, see |25J and the references therein. Since our WLRA formulation corresponds 
to the special case of a diagonal (nonnegative) P, our hardness results also apply to Problem (|4.ip . 

It is also worth pointing out that, when the data matrix M is nonnegative, any optimal solution 
to rank-one (jWLRAp can be assumed to be nonnegative (see discussion for Example [T]). Therefore, all 
the complexity results of this paper apply to the weighted nonnegative matrix factorization problem 
(weighted NMF), which is the following low-rank matrix approximation problem with nonnegativity 
constraints on the factors 

min \\M-UV T \\ 2 W such that U > 0, V > 0. 

UeR mxr ,VeR nxr 

Hence, it it is NP-hard to find an approximate solution to rank-one weighted NMF (used, e.g., in 
image processing |14t Chapter 6]) and to rank-one NMF with missing data (used, e.g., for collaborative 
filtering [3]). This is in contrast with unweighted rank-one NMF, which is polynomially solvable (e.g., 
taking the absolute value of the first rank-one factor generated by the singular value decomposition) . 
Note that (unweighted) NMF has been shown to be NP-hard when r is not fixed [29] (i.e., when r is 
part of the input). 

Nevertheless, many questions remain open, including the following: 

o Our approximation results are rather weak. In fact, they require the objective function accuracy 
to increase with the dimensions of the input matrix, in proportion with (mn)~ 6 , which is some- 
what counter-intuitive. The reason is twofold: first, independently of the size of the matrix, we 
needed the objective function value of approximate solutions of problems (|W-ld|) and (|MD-ld|) 
to be no larger than the objective function of the optimal biclique solution plus one (in order 
to obtain \E*\ by rounding). Second, parameter d in problems (|W-ldp and (|MD-ld|) depends 
on the dimensions of matrix M. Therefore, when matrices W or M are rescaled between 
and 1, the objective function accuracy is affected by parameter d, and hence decreases with the 
dimensions of matrix M. Strengthening of these bounds is a topic for further research. 

o Moreover, as pointed out to us, these results say nothing about the hardness of approximation 
within a constant multiplicative factor. It would then be interesting to combine our reduc- 
tions with inapproximability results for the biclique problem (which have yet to be investigated 
thoroughly, see, e.g., [28j). or construct reductions from other problems. 

o When W is the matrix of all ones, WLRA can be solved in polynomial-time. We have shown 
that, when the ratio between the largest and the smallest entry in W is large enough, the problem 
is NP-hard (Theorem [1]). It would be interesting to investigate the gap between these two facts, 
i.e., what is the minimum ratio between the entries of W that leads to an NP-hard WLRA 
problem? 
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o When rank(IU) = 1, WLRA can be solved in polynomial-time (cf. Section [2]) while it is NP-hard 
for a general matrix W (with rank up to min(m, n)). What is the complexity of (jWLRAp if the 
rank of the weight matrix W is fixed and greater than one, e.g., if rank(VU) = 2? 

o When data is missing, the rank-one matrix approximation problem is NP-hard in general. Nev- 
ertheless, it has been observed [1] that when the given entries are sufficiently numerous, well- 
distributed in the matrix, and affected by a relatively low level of noise, the original uncorrupted 
low-rank matrix can be recovered accurately, with a technique based on convex optimization 
(minimization of the nuclear norm of the approximation, which can be cast as a semidefinite 
program). It would then be particularly interesting to analyze the complexity of the problem 
given additional assumptions on the data matrix, for example on the noise distribution, and deal 
in particular with situations related to applications. 
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